Local features-based script recognition from printed bilingual document images
Internal identifier: 000775 (Main/Exploration); previous: 000774; next: 000776
Authors: S. Abirami [India]; D. Manjula [India]
Source:
- International journal of computer applications in technology [ 0952-8091 ] ; 2010.
French descriptors
- Pascal (Inist)
- Reconnaissance forme, Reconnaissance caractère, Reconnaissance optique caractère, Analyse documentaire, Langage naturel, Texte, Traitement image, Grille, Arbre décision, Document imprimé, Multilinguisme, Bilinguisme, Classification hiérarchique, Alphabet, Modélisation, Evaluation performance, 52477.
- Wicri :
- topic : Multilinguisme, Bilinguisme.
English descriptors
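- KwdEn : Alphabet, Bilingualism, Character recognition, Decision tree, Document analysis, Grid, Hierarchical classification, Image processing, Modeling, Multilingualism, Natural language, Optical character recognition, Pattern recognition, Performance evaluation, Printed document, Text.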
Abstract
Classification and identification of language in a biscript document is one of the important steps in the design of an OCR system for successful analysis and recognition. This paper presents an architecture for script recognition of bilingual document images (Tamil, English), which specifically takes on the challenges of recognition at character level by predicting the script of a word image from its initial character, thereby adapting to various font faces and sizes. The recogniser models every character as Tetra bit values (TBV), which correspond to the spatial spread derived from the segmented grids of the character. We employed a decision tree classifier (DTC) to classify the script from the patterns generated from the TBV. A spatial features-based script recogniser (SFBSR) is trained and tested with bilingual document images, consisting of various Tamil and English words, to show its effectiveness for script identification. Classification accuracy on the training and testing sets is promising. An evaluation of the system performance against various techniques shows a significant performance improvement with SFBSR. The recogniser can be embedded in an OCR system prior to its recognition stage.
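The abstract does not say how Tetra bit values are actually computed, so the following is only an illustrative sketch under stated assumptions: the character is a binary image, it is split into a 2x2 grid, and each cell contributes one bit that is set when the cell contains any ink. The function name `tetra_bits`, the grid layout, the bit order, and the sample glyph are all hypothetical and are not taken from the paper.

```python
from typing import List

Image = List[List[int]]  # binary image: 1 = ink, 0 = background


def tetra_bits(img: Image) -> int:
    """Return a 4-bit code, one bit per quadrant (TL, TR, BL, BR).

    Assumption (not from the paper): a bit is set when its quadrant
    contains at least one ink pixel.
    """
    rows, cols = len(img), len(img[0])
    mid_r, mid_c = rows // 2, cols // 2
    quadrants = [
        (range(0, mid_r), range(0, mid_c)),        # top-left
        (range(0, mid_r), range(mid_c, cols)),     # top-right
        (range(mid_r, rows), range(0, mid_c)),     # bottom-left
        (range(mid_r, rows), range(mid_c, cols)),  # bottom-right
    ]
    code = 0
    for r_range, c_range in quadrants:
        has_ink = any(img[r][c] for r in r_range for c in c_range)
        code = (code << 1) | int(has_ink)
    return code


# Hypothetical 4x4 glyph shaped like "L": ink in the left column and bottom row.
GLYPH_L = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]

print(format(tetra_bits(GLYPH_L), "04b"))  # prints 1011
```

Codes of this kind, computed for the initial character of each word, would be the sort of pattern a decision tree classifier could split on. The tree itself is not sketched here because the abstract gives no details of its structure.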
Affiliations:
Links toward previous steps (curation, corpus...)
- to stream PascalFrancis, to step Corpus: 000153
- to stream PascalFrancis, to step Curation: 000620
- to stream PascalFrancis, to step Checkpoint: 000149
- to stream Main, to step Merge: 000780
- to stream Main, to step Curation: 000775
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Local features-based script recognition from printed bilingual document images</title>
<author><name sortKey="Abirami, S" sort="Abirami, S" uniqKey="Abirami S" first="S." last="Abirami">S. Abirami</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Computer Science and Engineering, Anna University</s1>
<s2>Guindy, Chennai 600 025</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Guindy, Chennai 600 025</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Manjula, D" sort="Manjula, D" uniqKey="Manjula D" first="D." last="Manjula">D. Manjula</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Computer Science and Engineering, Anna University</s1>
<s2>Guindy, Chennai 600 025</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Guindy, Chennai 600 025</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">11-0056451</idno>
<date when="2010">2010</date>
<idno type="stanalyst">PASCAL 11-0056451 INIST</idno>
<idno type="RBID">Pascal:11-0056451</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000153</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000620</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000149</idno>
<idno type="wicri:doubleKey">0952-8091:2010:Abirami S:local:features:based</idno>
<idno type="wicri:Area/Main/Merge">000780</idno>
<idno type="wicri:Area/Main/Curation">000775</idno>
<idno type="wicri:Area/Main/Exploration">000775</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Local features-based script recognition from printed bilingual document images</title>
<author><name sortKey="Abirami, S" sort="Abirami, S" uniqKey="Abirami S" first="S." last="Abirami">S. Abirami</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Computer Science and Engineering, Anna University</s1>
<s2>Guindy, Chennai 600 025</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Guindy, Chennai 600 025</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Manjula, D" sort="Manjula, D" uniqKey="Manjula D" first="D." last="Manjula">D. Manjula</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>Department of Computer Science and Engineering, Anna University</s1>
<s2>Guindy, Chennai 600 025</s2>
<s3>IND</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Inde</country>
<wicri:noRegion>Guindy, Chennai 600 025</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">International journal of computer applications in technology</title>
<title level="j" type="abbreviated">Int. j. comput. appl. technol.</title>
<idno type="ISSN">0952-8091</idno>
<imprint><date when="2010">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">International journal of computer applications in technology</title>
<title level="j" type="abbreviated">Int. j. comput. appl. technol.</title>
<idno type="ISSN">0952-8091</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Alphabet</term>
<term>Bilingualism</term>
<term>Character recognition</term>
<term>Decision tree</term>
<term>Document analysis</term>
<term>Grid</term>
<term>Hierarchical classification</term>
<term>Image processing</term>
<term>Modeling</term>
<term>Multilingualism</term>
<term>Natural language</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Performance evaluation</term>
<term>Printed document</term>
<term>Text</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Reconnaissance forme</term>
<term>Reconnaissance caractère</term>
<term>Reconnaissance optique caractère</term>
<term>Analyse documentaire</term>
<term>Langage naturel</term>
<term>Texte</term>
<term>Traitement image</term>
<term>Grille</term>
<term>Arbre décision</term>
<term>Document imprimé</term>
<term>Multilinguisme</term>
<term>Bilinguisme</term>
<term>Classification hiérarchique</term>
<term>Alphabet</term>
<term>Modélisation</term>
<term>Evaluation performance</term>
<term>52477</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr"><term>Multilinguisme</term>
<term>Bilinguisme</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Classification and identification of language in a biscript document is one of the important steps in the design of an OCR system for successful analysis and recognition. This paper presents architecture for script recognition of bilingual document images (Tamil, English), which specifically takes the challenges of recognition at character level by predicting the script of word image using its initial character, thereby adapting to various font faces and sizes. This recogniser models every character as Tetra bit values (TBV), which corresponds to the spatial spread, derived from the segmented grids of the character. We employed a decision tree classifier (DTC) for the classification of script on over the patterns generated from TBV. A spatial features-based script recogniser (SFBSR) is trained and tested with bilingual document images, consisting of various Tamil and English words, to show its effectiveness towards script identification. Classification accuracy in training and testing sets is promising. An evaluation of the system performance with various techniques shows a significant performance improvement in SFBSR. This can be embedded with OCR prior to its recognition stage.</div>
</front>
</TEI>
<affiliations><list><country><li>Inde</li>
</country>
</list>
<tree><country name="Inde"><noRegion><name sortKey="Abirami, S" sort="Abirami, S" uniqKey="Abirami S" first="S." last="Abirami">S. Abirami</name>
</noRegion>
<name sortKey="Manjula, D" sort="Manjula, D" uniqKey="Manjula D" first="D." last="Manjula">D. Manjula</name>
</country>
</tree>
</affiliations>
</record>
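As displayed, the record uses the `wicri:` and `inist:` prefixes without declaring them, so a namespace-aware parser rejects it as-is. The sketch below shows one possible workaround with Python's standard `xml.etree.ElementTree`: wrap the record in an element that binds the prefixes. The wrapper element, the `urn:example:...` URIs, and the `summarize` helper are assumptions made for illustration, and the fragment is a trimmed copy of the record above.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the record shown above (two keywords kept for brevity).
FRAGMENT = """
<record><TEI><teiHeader><fileDesc><titleStmt>
<title xml:lang="en" level="a">Local features-based script recognition from printed bilingual document images</title>
<author><name sortKey="Abirami, S" first="S." last="Abirami">S. Abirami</name>
<affiliation wicri:level="1"><country>Inde</country></affiliation></author>
<author><name sortKey="Manjula, D" first="D." last="Manjula">D. Manjula</name>
<affiliation wicri:level="1"><country>Inde</country></affiliation></author>
</titleStmt></fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Alphabet</term>
<term>Bilingualism</term>
</keywords></textClass></profileDesc>
</teiHeader></TEI></record>
"""

# Placeholder namespace URIs (assumptions): they exist only so that the
# parser accepts the otherwise unbound prefixes.
WRAPPER = (
    '<wrapper xmlns:wicri="urn:example:wicri" '
    'xmlns:inist="urn:example:inist">{}</wrapper>'
)


def summarize(record_xml: str) -> dict:
    """Extract title, author names, and English keywords from a record."""
    root = ET.fromstring(WRAPPER.format(record_xml))
    title_stmt = root.find(".//titleStmt")
    return {
        "title": title_stmt.findtext("title"),
        "authors": [name.text for name in title_stmt.iter("name")],
        "keywords": [
            term.text
            for term in root.findall(".//keywords[@scheme='KwdEn']/term")
        ],
    }


SUMMARY = summarize(FRAGMENT)
print(SUMMARY["authors"])  # prints ['S. Abirami', 'D. Manjula']
```

Restricting the author lookup to `titleStmt` avoids counting each author twice, since the full record repeats the same names under `sourceDesc` and in the affiliation tree.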
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000775 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000775 | SxmlIndent | more
To add a link to this page in the Wicri network
{{Explor lien |wiki= Ticri/CIDE |area= OcrV1 |flux= Main |étape= Exploration |type= RBID |clé= Pascal:11-0056451 |texte= Local features-based script recognition from printed bilingual document images }}
This area was generated with Dilib version V0.6.32.